## 1. Register renaming

| Label | Opcode | Operand 1 | Operand 2 | Operand 3 | Mapping |
|-------|--------|-----------|-----------|-----------|---------|
| Loop: | lw     | \$p5      | 40(\$p6)  |           | p5[p2]  |
|       | lw     | \$p8      | 60(\$p6)  |           | p8[p3]  |
|       | add    | \$p9      | \$p8      | \$p5      | p9[p4]  |
|       | sll    | \$p10     | \$p5      | 4         | p10[p5] |
|       | sw     | \$p9      | 80(\$p7)  |           |         |
|       | sw     | \$p10     | 40(\$p6)  |           |         |
|       | addi   | \$p11     | \$p6      | 4         | p11[p6] |
|       | addi   | \$p12     | \$p7      | 4         | p12[p7] |
|       | addi   | \$p13     | \$p1      | -1        | p13[p1] |
|       | bnez   | \$p13     | Loop      |           |         |

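After renaming, 6 true (RAW) dependencies remain, and only 5 instructions (both `lw`s and the three `addi`s) are free of them, so at most 5 instructions can run in parallel. Renaming starts from the first instruction, and every destination gets a fresh physical register, because we assume it may have false (WAR/WAW) dependencies with earlier instructions that are not listed.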
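The two counts can be checked mechanically. Below is a minimal Python sketch (the `LOOP` encoding and the `true_dependencies` helper are our own, hypothetical names) that walks the renamed loop, remembers which instruction last wrote each physical register, and records a read-after-write edge whenever a later instruction reads it. It looks at register operands only and ignores any ordering through memory.

```python
# Renamed loop body from the table: (opcode, destination, register sources).
# Stores and the branch write no register, so their destination is None.
# Immediates, offsets and the branch target are dropped: only registers matter.
LOOP = [
    ("lw",   "p5",  ["p6"]),
    ("lw",   "p8",  ["p6"]),
    ("add",  "p9",  ["p8", "p5"]),
    ("sll",  "p10", ["p5"]),
    ("sw",   None,  ["p9", "p7"]),
    ("sw",   None,  ["p10", "p6"]),
    ("addi", "p11", ["p6"]),
    ("addi", "p12", ["p7"]),
    ("addi", "p13", ["p1"]),
    ("bnez", None,  ["p13"]),
]


def true_dependencies(instrs):
    """Return (consumer, producer, register) for every RAW edge in the list."""
    last_writer = {}  # physical register -> index of the instruction writing it
    edges = []
    for i, (_, dest, srcs) in enumerate(instrs):
        for reg in srcs:
            # Registers never written inside the loop (p1, p6, p7) come from
            # earlier, unlisted instructions and add no edge here.
            if reg in last_writer:
                edges.append((i, last_writer[reg], reg))
        if dest is not None:
            last_writer[dest] = i
    return edges


edges = true_dependencies(LOOP)
waiting = {consumer for consumer, _, _ in edges}
ready = [i for i in range(len(LOOP)) if i not in waiting]
print(len(edges), "true dependencies")
print(len(ready), "instructions with no producer:", [LOOP[i][0] for i in ready])
```

Running it reports 6 true dependencies and 5 instructions with no in-loop producer (both `lw`s and the three `addi`s). Because every destination was renamed to a fresh register, no WAR or WAW edges are left to track, which is why a single `last_writer` map is enough.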

## 2. FGMT

| Instructions                | Cycle |
|-----------------------------|-------|
| [1.11] [1.12] [1.13]        | 1     |
| [2.11] [2.12]               | 2     |
| [3.11]                      | 3     |
| [1.21] [1.22] [1.23] [1.24] | 4     |
| [2.21] [2.22] [2.23]        | 5     |
| [3.21] [3.22]               | 6     |
| [1.31] [1.32]               | 7     |
| [2.31] [2.32] [2.33] [2.34] | 8     |
| [3.31] [3.32] [3.33]        | 9     |
| [1.41] [1.42]               | 10    |
| [2.41]                      | 11    |
| [3.41] [3.42]               | 12    |
| [1.51] [1.52] [1.53]        | 13    |
| [2.51] [2.52]               | 14    |
| [3.61] [3.62] [3.63] [3.64] | 15    |
| [1.61] [1.62]               | 16    |
| [2.61] [2.62]               | 17    |
| [3.71] [3.72] [3.73]        | 18    |
| [1.71] [1.72]               | 19    |
| [2.71]                      | 20    |
| [3.81]                      | 21    |

21 cycles for FGMT.
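The FGMT count follows from strict round-robin issue. The Python sketch below assumes no stalls, one bundle per cycle from a single thread, and a 4-wide issue width (the widest bundle in the tables); the bundle sizes are read off the tables, and `fgmt_schedule` is a hypothetical helper name.

```python
from collections import deque

# Bundle sizes per thread, read off the schedule tables (one entry per bundle,
# in program order). The tables list no 3.5x bundle for thread 3.
BUNDLES = {
    1: [3, 4, 2, 2, 3, 2, 2],
    2: [2, 3, 4, 1, 2, 2, 1],
    3: [1, 2, 3, 2, 4, 3, 1],
}
ISSUE_WIDTH = 4  # assumed: the widest bundle in the tables has 4 instructions


def fgmt_schedule(bundles):
    """Round-robin FGMT: each cycle, the next thread with work issues one bundle.

    Assumes no stalls, so a thread is always ready when its turn comes.
    Returns a list of (cycle, thread, bundle_size).
    """
    queues = {t: deque(sizes) for t, sizes in bundles.items()}
    schedule = []
    cycle = 0
    while any(queues.values()):
        for thread in sorted(queues):
            if queues[thread]:
                cycle += 1
                schedule.append((cycle, thread, queues[thread].popleft()))
    return schedule


schedule = fgmt_schedule(BUNDLES)
used = sum(size for _, _, size in schedule)
print(len(schedule), "cycles")
print(f"{used} of {len(schedule) * ISSUE_WIDTH} issue slots used")
```

It prints 21 cycles and 49 of 84 issue slots used: each thread contributes 7 bundles, so 3 × 7 = 21 cycles, and the other 35 slots stay empty because FGMT never fills a cycle with instructions from more than one thread.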

## SMT

| Instructions                | Cycle |
|-----------------------------|-------|
| [1.11] [1.12] [1.13] [2.11] | 1     |
| [2.12] [3.11] [1.21] [1.22] | 2     |
| [1.23] [1.24] [2.21] [2.22] | 3     |
| [2.23] [3.21] [3.22] [1.31] | 4     |
| [1.32] [3.31] [3.32] [3.33] | 5     |
| [1.41] [1.42] [2.31] [2.32] | 6     |
| [2.33] [2.34] [1.51] [1.52] | 7     |
| [1.53] [2.41] [3.41] [3.42] | 8     |
| [2.51] [2.52]               | 9     |
| [2.61] [2.62] [3.61] [3.62] | 10    |
| [1.61] [1.62] [3.63] [3.64] | 11    |
| [3.71] [3.72] [3.73] [2.71] | 12    |
| [1.71] [1.72] [3.81]        | 13    |

13 cycles for SMT.
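A quick bound shows why 13 cycles is the best SMT can do for this workload. The sketch assumes 4 issue slots per cycle (no row in the SMT table issues more) and ignores dependences, so ⌈total / 4⌉ is only a lower bound; the per-row counts are read off the SMT table.

```python
import math

# Bundle sizes per thread, read off the schedule tables (one entry per bundle).
BUNDLES = {
    1: [3, 4, 2, 2, 3, 2, 2],
    2: [2, 3, 4, 1, 2, 2, 1],
    3: [1, 2, 3, 2, 4, 3, 1],
}
ISSUE_WIDTH = 4  # assumed: no SMT row issues more than 4 instructions

# Instructions issued in each cycle of the SMT table (cycles 1..13).
SMT_ROWS = [4, 4, 4, 4, 4, 4, 4, 4, 2, 4, 4, 4, 3]

total = sum(sum(sizes) for sizes in BUNDLES.values())
# No schedule can beat this: every instruction needs one of ISSUE_WIDTH slots.
lower_bound = math.ceil(total / ISSUE_WIDTH)

# Sanity checks on the table itself.
assert sum(SMT_ROWS) == total
assert all(n <= ISSUE_WIDTH for n in SMT_ROWS)

print(total, "instructions, lower bound", lower_bound, "cycles")
print("SMT table uses", len(SMT_ROWS), "cycles")
```

The three threads contain 49 instructions, so at least ⌈49/4⌉ = 13 cycles are needed; the SMT table reaches that bound, leaving only 3 empty slots (2 in cycle 9, 1 in cycle 13).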

## 3.

Schedule: A1, B1, *stall*, B2, B3, A2, B4, A3, A4

Therefore, the schedule takes 13 cycles, and 1 slot is wasted (the stall).

## 4.

| Quantity                           | 4-way | 8-way |
|------------------------------------|-------|-------|
| Load-use cross-checks              | 32    | 128   |
| RAW intra-bundle dependency checks | 24    | 112   |
| Read ports                         | 8     | 16    |
| Write ports                        | 4     | 8     |
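These numbers fit simple closed forms in the issue width N. The formulas in this Python sketch are our assumption, picked because they reproduce every figure listed: each instruction has two source registers and one destination, load-use checks compare every source against every destination of the previous bundle (2N²), and intra-bundle RAW checks compare every source against the destinations of the other N − 1 instructions in the bundle (2N(N − 1)). `superscalar_checks` is a hypothetical name.

```python
def superscalar_checks(n):
    """Hardware counts for an n-wide issue pipeline (assumed formulas).

    Each instruction has 2 source registers and 1 destination register.
    """
    return {
        # 2 sources of each of n instructions vs. the n destinations of the
        # preceding bundle (the loads that might cause a load-use stall).
        "load_use_checks": 2 * n * n,
        # 2 sources of each instruction vs. the destinations of the other
        # n - 1 instructions in the same bundle.
        "raw_intra_bundle_checks": 2 * n * (n - 1),
        "read_ports": 2 * n,   # two source operands per instruction
        "write_ports": n,      # one result per instruction
    }


for width in (4, 8):
    print(width, superscalar_checks(width))
```

A variant that compares each source only against earlier instructions in the bundle would need N(N − 1) RAW checks instead, i.e. 12 and 56.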